@streamparser/json
Streaming JSON parser in Javascript for Node.js, Deno and the browser
@streamparser/json is a streaming JSON parser for Node.js that allows you to process JSON data as it is being received, rather than waiting for the entire JSON payload to be available. This can be particularly useful for handling large JSON files or streams of JSON data from APIs.
Streaming JSON Parsing
This feature allows you to parse JSON data as it is being received. The onValue callback is invoked whenever a complete JSON value is parsed; it receives an object whose value property holds the parsed value.
const { JSONParser } = require('@streamparser/json');
const jsonParser = new JSONParser();
jsonParser.onValue = ({ value }) => {
  console.log('Parsed value:', value);
};
jsonParser.write('{"key": "value"}');
jsonParser.end();
Handling Large JSON Files
This feature demonstrates how to handle large JSON files by streaming the file content and parsing it chunk by chunk. This avoids loading the entire file into memory.
const fs = require('fs');
const { JSONParser } = require('@streamparser/json');
const jsonParser = new JSONParser();
jsonParser.onValue = ({ value }) => {
  console.log('Parsed value:', value);
};
// Each chunk is a Buffer (a Uint8Array), which write() accepts directly
const readStream = fs.createReadStream('largeFile.json');
readStream.on('data', (chunk) => {
  jsonParser.write(chunk);
});
readStream.on('end', () => {
  jsonParser.end();
});
Streaming JSON from an API
This feature shows how to parse streaming JSON data from an API. The JSON data is processed as it is received, making it efficient for handling large or continuous streams of data.
const https = require('https');
const { JSONParser } = require('@streamparser/json');
const jsonParser = new JSONParser();
jsonParser.onValue = ({ value }) => {
  console.log('Parsed value:', value);
};
https.get('https://api.example.com/streaming-json', (res) => {
  res.on('data', (chunk) => {
    jsonParser.write(chunk);
  });
  res.on('end', () => {
    jsonParser.end();
  });
});
JSONStream is a streaming JSON parser that allows you to parse large JSON files or streams of JSON data. It provides functionality similar to @streamparser/json, but with a different API: JSONStream uses a pattern-matching approach to filter and process JSON data.
stream-json is another streaming JSON parser that provides tools to work with large JSON datasets. It offers a modular approach, allowing you to pick and choose the components you need for parsing, filtering, and processing JSON data. It is comparable to @streamparser/json in terms of functionality but offers more flexibility with its modular design.
oboe is a streaming JSON parser that focuses on making it easy to work with JSON APIs. It allows you to parse JSON data as it is received and provides a simple API for handling different parts of the JSON structure. Oboe is similar to @streamparser/json but is more focused on ease of use and working with JSON APIs.
Fast dependency-free library to parse a JSON stream using utf-8 encoding in Node.js, Deno or any modern browser. Fully compliant with the JSON spec and JSON.parse(...).
tl;dr
import { JSONParser } from '@streamparser/json';
const parser = new JSONParser();
parser.onValue = ({ value }) => { /* process data */ };
try {
  parser.write('{ "test": ["a"] }');
  // onValue will be called 3 times:
  // "a"
  // ["a"]
  // { test: ["a"] }
} catch (err) {
  console.log(err); // handle errors
}
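There are multiple flavours of @streamparser:
- @streamparser/json: the core parser, for any JSON string or stream.
- A wrapper of @streamparser/json into a WHATWG TransformStream.
- A wrapper of @streamparser/json into a Node Transform stream.
@streamparser/json requires a few ES6 classes, such as TextEncoder and TextDecoder (used for buffering, as described below). If you are targeting browsers or systems in which these might be missing, you need to polyfill them.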
Tokenizer
A JSON-compliant tokenizer that parses a UTF-8 stream into JSON tokens.
import { Tokenizer } from '@streamparser/json';
const tokenizer = new Tokenizer(opts);
The available options are:
{
  stringBufferSize: <number>, // set to 0 to disable buffering. Min valid value is 4.
  numberBufferSize: <number>, // set to 0 to disable buffering.
  separator: <string>, // separator between objects. For example `\n` for ND-JSON.
}
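To make the `separator` option concrete, the sketch below shows what an ND-JSON stream looks like and why chunk boundaries matter: a chunk can end in the middle of a line. It does not use @streamparser/json at all; `createNdjsonSplitter` is a hypothetical helper that naively splits on `\n` and hands each complete line to `JSON.parse`, keeping the incomplete tail for the next chunk. The separator option is what lets the library deal with this for you.

```javascript
// Hypothetical helper (not part of @streamparser/json): splits an ND-JSON stream
// on a separator and parses each complete line with JSON.parse. A chunk may end
// in the middle of a line, so the incomplete tail is kept until the next write.
function createNdjsonSplitter(onValue, separator = '\n') {
  let pending = '';
  return {
    write(chunk) {
      pending += chunk;
      const lines = pending.split(separator);
      pending = lines.pop(); // the last piece may still be incomplete
      for (const line of lines) {
        if (line.trim() !== '') onValue(JSON.parse(line));
      }
    },
    end() {
      // Whatever is left when the stream ends must be a complete value.
      if (pending.trim() !== '') onValue(JSON.parse(pending));
      pending = '';
    },
  };
}

const splitter = createNdjsonSplitter((value) => console.log(value));
splitter.write('{"id":1}\n{"id"'); // logs { id: 1 }
splitter.write(':2}\n{"id":3}');   // logs { id: 2 }
splitter.end();                    // logs { id: 3 }
```

This only works because ND-JSON puts exactly one value per line and JSON strings cannot contain raw newlines.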
If the buffer sizes are set to anything other than zero, the data is buffered in a TypedArray instead of being appended to a string as it comes in. A reasonable size could be 64 * 1024 (64 KB).
When parsing strings or numbers, the parser needs to gather the data in memory until the whole value is ready.
Strings are immutable in JavaScript, so every string operation creates a new string. The V8 engine, behind Node, Deno and most modern browsers, performs many different types of optimization. One of these optimizations is to over-allocate memory when it detects many string concatenations. This significantly increases memory consumption and can easily exhaust your memory when parsing JSON containing very large strings or numbers. For those cases, the parser can buffer the characters using a TypedArray. This requires encoding/decoding from/to the buffer into an actual string once the value is ready, which is done using the TextEncoder and TextDecoder APIs. Unfortunately, these APIs create a significant overhead when the strings are small, so they should be used only when strictly necessary.
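The buffering idea can be sketched as a growable byte buffer: text is encoded to UTF-8 bytes, copied into a Uint8Array that doubles when full, and decoded into a single string only once the value is complete. `ByteStringBuffer` is a hypothetical class written for illustration; it is not the library's internal implementation.

```javascript
// Hypothetical helper (not part of @streamparser/json): accumulates UTF-8 bytes
// in a Uint8Array instead of concatenating strings, and decodes once at the end.
class ByteStringBuffer {
  constructor(initialSize = 64 * 1024) {
    this.bytes = new Uint8Array(initialSize);
    this.length = 0;
    this.encoder = new TextEncoder();
    this.decoder = new TextDecoder();
  }

  // Encode a JS string to UTF-8 and append its bytes.
  appendString(str) {
    this.appendBytes(this.encoder.encode(str));
  }

  // Append raw bytes, doubling the backing array whenever it runs out of room.
  appendBytes(chunk) {
    const needed = this.length + chunk.length;
    if (needed > this.bytes.length) {
      let newSize = Math.max(1, this.bytes.length * 2);
      while (newSize < needed) newSize *= 2;
      const grown = new Uint8Array(newSize);
      grown.set(this.bytes.subarray(0, this.length));
      this.bytes = grown;
    }
    this.bytes.set(chunk, this.length);
    this.length = needed;
  }

  // Decode the accumulated bytes into one string and reset the buffer.
  flush() {
    const result = this.decoder.decode(this.bytes.subarray(0, this.length));
    this.length = 0;
    return result;
  }
}

const buf = new ByteStringBuffer(4);
buf.appendString('Hello, ');
buf.appendString('wörld');
console.log(buf.flush()); // "Hello, wörld"
```

The encode/decode round trip is exactly the overhead mentioned above, which is why this only pays off for very large values.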
parseNumber: by default, numbers are converted with Number(numberStr), but you can override this method if you want some other behaviour.
// You can override the overridable methods by creating your own class extending Tokenizer
class MyTokenizer extends Tokenizer {
  parseNumber(numberStr) {
    const number = super.parseNumber(numberStr);
    // If the number is too large (not finite), just keep the string.
    return Number.isFinite(number) ? number : numberStr;
  }
  onToken({ token, value, offset }) {
    // Re-emit numbers that were kept as strings as STRING tokens.
    if (token === TokenTypes.NUMBER && typeof value === 'string') {
      super.onToken({ token: TokenTypes.STRING, value, offset });
    } else {
      super.onToken({ token, value, offset });
    }
  }
}
const myTokenizer = new MyTokenizer();
// or just overriding it
const tokenizer = new Tokenizer();
tokenizer.parseNumber = (numberStr) => { /* custom number parsing */ };
tokenizer.onToken = ({ token, value, offset }) => { /* process token */ };
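As a standalone illustration of the kind of custom number handling a parseNumber override can provide, the sketch below keeps the original literal whenever converting it with Number() would lose information: on overflow to Infinity, and for integer literals beyond Number.MAX_SAFE_INTEGER. `parseNumberOrKeepString` is a hypothetical name; treating only plain integer literals as precision-sensitive is a simplifying assumption.

```javascript
// Hypothetical helper (not part of @streamparser/json): convert a JSON number
// literal with Number(), but keep the original string whenever the conversion
// would lose information.
function parseNumberOrKeepString(numberStr) {
  const number = Number(numberStr);
  // Overflow: e.g. "1e400" becomes Infinity.
  if (!Number.isFinite(number)) return numberStr;
  // Integer literals beyond 2^53 - 1 cannot be represented exactly as doubles.
  if (/^-?\d+$/.test(numberStr) && !Number.isSafeInteger(number)) return numberStr;
  return number;
}

console.log(parseNumberOrKeepString('42'));               // 42
console.log(parseNumberOrKeepString('1e400'));            // '1e400'
console.log(parseNumberOrKeepString('9007199254740993')); // '9007199254740993'
```

It could be assigned with `tokenizer.parseNumber = parseNumberOrKeepString`; downstream code must then be ready for NUMBER tokens whose value is a string.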
TokenParser
A token parser that processes JSON tokens as emitted by the Tokenizer and emits JSON values/objects.
import { TokenParser } from '@streamparser/json';
const tokenParser = new TokenParser(opts);
The available options are:
{
  paths: <string[]>,
  keepStack: <boolean>, // whether to keep all the properties in the stack
  separator: <string>, // separator between objects. For example `\n` for ND-JSON. If omitted or set to undefined, the token parser will end after parsing the first object. To parse multiple objects without any delimiter, set it to the empty string `''`.
}
paths: an array of paths to emit. Defaults to undefined, which emits everything. The paths are intended to support JSONPath, although at the moment only the root object selector ($) and subproperty selectors, including wildcards ($.a, $.*, $.a.b, $.*.b, etc.), are supported.
keepStack: defaults to true. When set to false, it does not preserve properties in the parent object unless some ancestor will be emitted. This means that the parent object passed to the onValue function will be empty, which doesn't reflect the truth, but it's more memory-efficient.
// You can override the overridable methods by creating your own class extending TokenParser
class MyTokenParser extends TokenParser {
  onValue({ value, key, parent, stack }) {
    // ...
  }
}
const myTokenParser = new MyTokenParser();
// or just overriding it
const tokenParser = new TokenParser();
tokenParser.onValue = ({ value, key, parent, stack }) => { /* process value */ };
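To clarify which values a selector such as `$.*.b` picks out, here is a sketch of matching the supported subset ($, $.a, $.*, $.a.b, $.*.b) against a value's key path. `matchesPath` is hypothetical, the key path is assumed to be the list of keys and array indices from the root to the value, and a selector is assumed to match only values at exactly that depth. This is not the library's matching code.

```javascript
// Hypothetical helper (not part of @streamparser/json): checks whether a key
// path such as ['items', 0] matches a selector like '$', '$.a', '$.*', '$.*.b'.
function matchesPath(selector, keyPath) {
  const parts = selector.split('.');
  if (parts[0] !== '$') throw new Error(`Unsupported selector: ${selector}`);
  const segments = parts.slice(1); // segments below the root
  // A selector only matches values at exactly its depth.
  if (segments.length !== keyPath.length) return false;
  // '*' matches any key or index; otherwise keys must be equal.
  return segments.every((segment, i) => segment === '*' || segment === String(keyPath[i]));
}

console.log(matchesPath('$', []));            // true: the root value
console.log(matchesPath('$.*', [0]));         // true: any direct child
console.log(matchesPath('$.*.b', ['x', 'b'])); // true
console.log(matchesPath('$.a.b', ['a']));      // false: wrong depth
```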
JSONParser
A drop-in replacement for JSONparse (with a few breaking changes and improvements; see below).
import { JSONParser } from '@streamparser/json';
const parser = new JSONParser();
It takes the same options as the tokenizer, plus the token parser options such as paths (as used in the examples below).
This class is just for convenience. In reality, it simply connects the tokenizer and the parser:
const tokenizer = new Tokenizer(opts);
const tokenParser = new TokenParser(opts);
tokenizer.onToken = tokenParser.write.bind(tokenParser);
tokenParser.onValue = ({ value }) => { /* Process values */ };
// You can override the overridable methods by creating your own class extending JSONParser
class MyJsonParser extends JSONParser {
  onToken({ token, value, offset }) {
    // ...
  }
  onValue({ value, key, parent, stack }) {
    // ...
  }
}
const myJsonParser = new MyJsonParser();
// or just overriding it
const jsonParser = new JSONParser();
jsonParser.onToken = ({ token, value, offset }) => { /* process token */ };
jsonParser.onValue = ({ value, key, parent, stack }) => { /* process value */ };
You can also use both components independently, wiring the tokenizer to the token parser yourself as shown above.
You push data using the write method, which takes a string or an array-like object (such as a Uint8Array).
You can subscribe to the resulting data using the onValue callback.
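import { JSONParser } from '@streamparser/json';
const parser = new JSONParser({ stringBufferSize: undefined, paths: ['$'] });
parser.onValue = ({ value }) => console.log(value);
parser.write('"Hello world!"'); // logs "Hello world!"
// Or passing the stream in several chunks (a new parser is needed: without
// a separator, a parser ends after its first value)
const chunkedParser = new JSONParser({ stringBufferSize: undefined, paths: ['$'] });
chunkedParser.onValue = ({ value }) => console.log(value);
chunkedParser.write('"');
chunkedParser.write('Hello');
chunkedParser.write(' ');
chunkedParser.write('world!');
chunkedParser.write('"'); // logs "Hello world!"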
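When the input arrives as bytes, a multi-byte UTF-8 character can be split across two chunks. If you decode chunks to strings yourself before calling write, decoding each chunk independently corrupts such characters; a TextDecoder in streaming mode carries the partial bytes over to the next call. The sketch below uses only the standard TextEncoder/TextDecoder APIs to show the difference.

```javascript
// "é" is 2 bytes in UTF-8; cut the input right between them.
const bytes = new TextEncoder().encode('"héllo"');
const first = bytes.subarray(0, 3); // '"', 'h' and the first byte of 'é'
const second = bytes.subarray(3);

// Decoding each chunk independently turns the split character into U+FFFD.
const naive = new TextDecoder().decode(first) + new TextDecoder().decode(second);

// A streaming decoder keeps the incomplete byte until the next call.
const decoder = new TextDecoder();
const streamed = decoder.decode(first, { stream: true }) + decoder.decode(second);

console.log(naive);    // contains replacement characters (U+FFFD)
console.log(streamed); // "héllo" (with the quotes)
```

Passing the raw Uint8Array chunks straight to write, as the fetch examples in this document do, avoids decoding them yourself.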
Write is always a synchronous operation, so any error during the parsing of the stream will be thrown during the write call. After an error, the parser can't continue parsing.
import { JSONParser } from '@streamparser/json';
const parser = new JSONParser({ stringBufferSize: undefined });
parser.onValue = console.log;
try {
parser.write('"""');
} catch (err) {
  console.log(err); // logs the parsing error
}
You can also handle errors using callbacks:
import { JSONParser } from '@streamparser/json';
const parser = new JSONParser({ stringBufferSize: undefined });
parser.onValue = console.log;
parser.onError = console.error;
parser.write('"""');
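Because a parser cannot continue after an error, code that feeds it from a long-lived stream may want to stop forwarding chunks once the first error has occurred. The sketch below is a hypothetical wrapper, `createGuardedWriter`, around any synchronous write function; in the test it wraps JSON.parse purely as a stand-in that throws on bad input, and with the library you would pass something like `(chunk) => parser.write(chunk)`.

```javascript
// Hypothetical wrapper (not part of @streamparser/json): remembers the first
// error thrown by a synchronous write function and rejects all later writes.
function createGuardedWriter(write) {
  let failure = null;
  return {
    write(chunk) {
      if (failure) throw new Error(`Parser already failed: ${failure.message}`);
      try {
        write(chunk);
      } catch (err) {
        failure = err; // remember the first error and re-throw it
        throw err;
      }
    },
    get failed() {
      return failure !== null;
    },
  };
}
```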
Imagine an endpoint that sends a large number of JSON objects one after the other ({"id":1}{"id":2}{"id":3}...).
import { JSONParser } from '@streamparser/json';
// separator: '' allows several concatenated values without any delimiter
const parser = new JSONParser({ separator: '' });
parser.onValue = ({ value, key, parent, stack }) => {
  if (stack.length > 0) return; // ignore inner values
  // TODO process element
};
const response = await fetch('http://example.com/');
const reader = response.body.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  parser.write(value);
}
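The stack check above keeps only values at nesting depth 0, that is, the top-level objects. The sketch below illustrates the same idea without the library: a hypothetical `createTopLevelSplitter` tracks brace/bracket depth (ignoring braces inside strings, including escaped quotes) and hands each complete top-level value to JSON.parse. It assumes every top-level value is an object or an array and, unlike a real streaming parser, re-parses each value only once it is complete.

```javascript
// Hypothetical helper (not part of @streamparser/json): splits concatenated
// top-level JSON objects/arrays, e.g. {"id":1}{"id":2}, across arbitrary chunks.
function createTopLevelSplitter(onValue) {
  let depth = 0;
  let inString = false;
  let escaped = false;
  let buf = '';
  return function write(chunk) {
    for (const ch of chunk) {
      if (depth === 0 && /\s/.test(ch)) continue; // whitespace between values
      buf += ch;
      if (inString) {
        if (escaped) escaped = false;          // character after a backslash
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false; // closing quote
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === '{' || ch === '[') depth++;
      else if (ch === '}' || ch === ']') {
        depth--;
        if (depth === 0) {                     // a top-level value is complete
          onValue(JSON.parse(buf));
          buf = '';
        }
      }
    }
  };
}

const write = createTopLevelSplitter((value) => console.log(value));
write('{"id":1}{"id');
write('":2} {"s":"}{"}'); // braces inside strings are ignored
```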
Imagine an endpoint that sends a large JSON array of objects ([{"id":1},{"id":2},{"id":3},...]).
import { JSONParser } from '@streamparser/json';
const jsonparser = new JSONParser({ stringBufferSize: undefined, paths: ['$.*'] });
jsonparser.onValue = ({ value, key, parent, stack }) => {
// TODO process element
};
const response = await fetch('http://example.com/');
const reader = response.body.getReader();
while(true) {
const { done, value } = await reader.read();
if (done) break;
jsonparser.write(value);
}
The arguments of callbacks have been objectified.
What used to be:
jsonparser.onToken = (token, value) => {
  // TODO process token
};
jsonparser.onValue = (value, key, parent, stack) => {
  // TODO process element
};
now is:
jsonparser.onToken = ({ token, value }) => {
  // TODO process token
};
jsonparser.onValue = ({ value, key, parent, stack }) => {
  // TODO process element
};
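Since the callback arguments have been objectified, handlers written for the old positional signature need their parameters destructured. The sketch below shows hypothetical adapter functions (not part of the library) that wrap an existing positional handler so it can be assigned to the objectified callbacks unchanged; including `offset` in the token adapter is an assumption based on the onToken examples in this document.

```javascript
// Hypothetical adapters (not part of @streamparser/json): wrap old-style
// positional handlers so they fit the objectified callback signatures.
function adaptLegacyOnValue(legacyHandler) {
  return ({ value, key, parent, stack }) => legacyHandler(value, key, parent, stack);
}

function adaptLegacyOnToken(legacyHandler) {
  return ({ token, value, offset }) => legacyHandler(token, value, offset);
}

// Usage sketch:
// jsonparser.onValue = adaptLegacyOnValue((value, key, parent, stack) => { /* ... */ });
// jsonparser.onToken = adaptLegacyOnToken((token, value, offset) => { /* ... */ });
```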
See [LICENSE.md].
FAQs
The npm package @streamparser/json receives a total of 375,666 weekly downloads. As such, its popularity is classified as popular.
We found that @streamparser/json demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.